# Low-resource deployment
## DiffuCoder-7B-cpGRPO-8bit
mlx-community · Large Language Model · Other · 272 downloads · 2 likes

A code generation model converted to MLX format from apple/DiffuCoder-7B-cpGRPO, designed to give developers an efficient code generation tool.
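The practical appeal of an 8-bit MLX conversion such as DiffuCoder-7B-cpGRPO-8bit is memory headroom. A back-of-envelope sketch of weight storage at different precisions (real footprints also include the KV cache, activations, and per-block scale metadata, so treat these as lower bounds):

```python
# Rough weight-memory estimate for a 7B-parameter model at different precisions.
# This ignores KV cache, activations, and quantization metadata, which all add
# to the real footprint.

def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for n_params at bits_per_weight."""
    return n_params * bits_per_weight / 8 / 1024**3

n = 7e9  # 7B parameters
for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>5}: ~{weight_gib(n, bits):.1f} GiB")
```

Halving the bits halves the weight memory, which is what moves a 7B model from "needs a workstation GPU" to "fits on a laptop".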
## UniReason-Qwen3-14B-RL-GGUF
mradermacher · Apache-2.0 · Large Language Model · Transformers · English · 272 downloads · 1 like

A static quantization of UniReason-Qwen3-14B-RL, suitable for text generation and mathematical-reasoning research.
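Static GGUF quantization of the kind used in builds like this comes down to block-wise rounding: split the weights into fixed-size blocks and store one scale plus low-bit integers per block. A minimal sketch in the spirit of the Q8_0 format, assuming the Q8_0 block size of 32 (the real on-disk layout is defined in llama.cpp and stores the scale as fp16):

```python
import numpy as np

# Illustrative sketch of block quantization in the spirit of GGUF's Q8_0:
# weights are split into blocks of 32, each block stores one scale and
# 32 signed 8-bit values.

BLOCK = 32

def quantize_q8_0(x: np.ndarray):
    blocks = x.reshape(-1, BLOCK)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0  # avoid division by zero on all-zero blocks
    q = np.round(blocks / scales).astype(np.int8)
    return q, scales

def dequantize_q8_0(q, scales):
    return (q.astype(np.float32) * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=256).astype(np.float32)
q, s = quantize_q8_0(w)
w_hat = dequantize_q8_0(q, s)
print("max abs error:", float(np.abs(w - w_hat).max()))
```

Because each block gets its own scale, a few large weights in one block cannot blow up the rounding error everywhere else.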
## Gemma-3n-E2B-GGUF
mradermacher · Large Language Model · Transformers · English · 207 downloads · 0 likes

A static quantization of Google's Gemma-3n-E2B model, offering various quantization types to balance model size and performance.
## Delta-Vector Austral-70B-Winton GGUF
bartowski · Apache-2.0 · Large Language Model · English · 791 downloads · 1 like

A quantized build of Delta-Vector's Austral-70B-Winton. Quantization cuts the model's storage and compute requirements while preserving most of its quality, making it practical on limited hardware.
## Gama-12B-i1-GGUF
mradermacher · Large Language Model · Transformers · Multilingual · 559 downloads · 1 like

A quantization of Gama-12B offering files in several quantization types, suited to text generation in English and Portuguese.
## Gama-12B-GGUF
mradermacher · Large Language Model · Transformers · Multilingual · 185 downloads · 1 like

Gama-12B is a multilingual large language model, offered here in several quantized versions to trade off performance and precision.
## LongWriter-Zero-32B-i1-GGUF
mradermacher · Apache-2.0 · Large Language Model · Transformers · Multilingual · 135 downloads · 1 like

A quantization of THU-KEG/LongWriter-Zero-32B that supports both Chinese and English, suited to long-context scenarios such as reinforcement-learning research and long-form writing.
## Skywork-SWE-32B GGUF
bartowski · Apache-2.0 · Large Language Model · 921 downloads · 2 likes

Skywork-SWE-32B is a 32B-parameter large language model, quantized with llama.cpp's imatrix method so it can run efficiently in resource-constrained environments.
## NVIDIA AceReason-Nemotron-1.1-7B GGUF
bartowski · Other · Large Language Model · Multilingual · 1,303 downloads · 1 like

A quantized version of NVIDIA's AceReason-Nemotron-1.1-7B that improves runtime efficiency across different hardware while retaining most of the original performance and quality.
## OpenBuddy-R1-0528-Distill-Qwen3-32B-Preview0-QAT GGUF
bartowski · Apache-2.0 · Large Language Model · Multilingual · 720 downloads · 1 like

A quantized version of OpenBuddy-R1-0528-Distill-Qwen3-32B-Preview0-QAT that runs more efficiently across a range of hardware.
## Qwen3-Embedding-0.6B ONNX uint8
electroglyph · Apache-2.0 · Text Embedding · 112 downloads · 8 likes

A uint8-quantized ONNX export of Qwen/Qwen3-Embedding-0.6B that shrinks the model while preserving retrieval performance.
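uint8 embedding quantization works because retrieval only needs the ranking of similarities to survive, not the exact float values. A small illustration with synthetic vectors (the shared affine scale/offset scheme below is an assumption about how such exports are typically calibrated, not the specific recipe used for this model):

```python
import numpy as np

# Sketch of uint8 embedding quantization: map each float32 value into
# [0, 255] with a shared scale and offset, then check that cosine
# similarities to a query are nearly unchanged after the round trip.

rng = np.random.default_rng(1)
docs = rng.normal(size=(100, 64)).astype(np.float32)   # mock document embeddings
query = rng.normal(size=64).astype(np.float32)

lo, hi = docs.min(), docs.max()
scale = (hi - lo) / 255.0

def to_uint8(x):
    return np.clip(np.round((x - lo) / scale), 0, 255).astype(np.uint8)

def from_uint8(q):
    return q.astype(np.float32) * scale + lo

def cosine_sims(vecs, q):
    return vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q))

sims_f32 = cosine_sims(docs, query)
sims_u8 = cosine_sims(from_uint8(to_uint8(docs)), query)
print("similarity correlation:", float(np.corrcoef(sims_f32, sims_u8)[0, 1]))
```

The correlation stays extremely close to 1, which is why nearest-neighbor results barely change while storage drops 4x versus float32.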
## Wan2.1-T2V-14B FusionX VACE GGUF
QuantStack · Apache-2.0 · Text-to-Video · English · 461 downloads · 3 likes

A text-to-video model produced by quantization conversion from its base model, supporting a range of video generation tasks.
## Wan2.1-T2V-14B FusionX GGUF
QuantStack · Apache-2.0 · Text-to-Video · English · 563 downloads · 2 likes

A quantized text-to-video model that converts the base model to GGUF format and can be used in ComfyUI, adding another option for text-to-video generation.
## DeepSeek-R1-0528-Qwen3-8B 6bit
mlx-community · MIT · Large Language Model · 582 downloads · 1 like

A 6-bit quantized conversion of DeepSeek-R1-0528-Qwen3-8B, suited to text generation under the MLX framework.
## Blitzar-Coder-4B-F.1-GGUF
prithivMLmods · Apache-2.0 · Large Language Model · Transformers · 267 downloads · 1 like

Blitzar-Coder-4B-F.1 is an efficient multilingual coding model fine-tuned from Qwen3-4B. It supports more than ten programming languages and offers strong code generation, debugging, and reasoning capabilities.
## Echelon-AI Med-Qwen2-7B GGUF
featherless-ai-quants · Large Language Model · 183 downloads · 1 like

GGUF quantized files for the Echelon-AI/Med-Qwen2-7B model, provided with support from Featherless AI, intended to improve efficiency and reduce running costs.
## Gemma-3n-E4B-it
google · Image-to-Text · Transformers · 1,690 downloads · 81 likes

Gemma 3n is a lightweight, state-of-the-art open multimodal model family from Google, built on the same research and technology as the Gemini models, accepting text, audio, and visual inputs.
## Bielik-11B-v2.6-Instruct-GGUF
speakleash · Apache-2.0 · Large Language Model · Transformers · 206 downloads · 5 likes

Bielik-11B-v2.6-Instruct is a Polish large language model developed by SpeakLeash and ACK Cyfronet AGH, fine-tuned from Bielik-11B-v2 for instruction-following tasks.
## Phi-3.5-mini-instruct
Lexius · MIT · Large Language Model · Transformers · Other · 129 downloads · 1 like

Phi-3.5-mini-instruct is a lightweight, state-of-the-art open model built on the datasets used for Phi-3, with a focus on high-quality, reasoning-dense data. It supports a 128K-token context length and has strong multilingual and long-context capabilities.
## DeepSeek-R1-0528-GGUF
lmstudio-community · MIT · Large Language Model · 1,426 downloads · 5 likes

A quantized build of DeepSeek-R1-0528, focused on text generation and packaged for more efficient local use.
## infly inf-o1-pi0 GGUF
bartowski · Large Language Model · Multilingual · 301 downloads · 1 like

A quantized version of the infly/inf-o1-pi0 model for multilingual text generation, produced with llama.cpp's imatrix quantization.
## medgemma-4b-it GGUF
second-state · Other · Text-to-Image · Transformers · 564 downloads · 1 like

medgemma-4b-it is a medical-domain multimodal model that accepts image and text inputs, suitable for scenarios such as radiology and clinical reasoning.
## Devstral-Small-2505 4bit DWQ
mlx-community · Apache-2.0 · Large Language Model · Multilingual · 238 downloads · 3 likes

A 4-bit quantized language model in MLX format, suited to text generation tasks.
## Facebook KernelLLM GGUF
bartowski · Other · Large Language Model · 5,151 downloads · 2 likes

KernelLLM is a large language model developed by Facebook. This release is quantized with llama.cpp's imatrix method and offers multiple quantization options to suit different hardware.
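The "imatrix" in quantized releases like this one refers to an importance matrix gathered from calibration runs: llama.cpp uses per-weight activation statistics to choose quantization parameters that minimize error where it matters most. A toy sketch of the principle for one block, with a mock importance vector (the actual search in llama.cpp is considerably more sophisticated):

```python
import numpy as np

# Toy sketch of importance-weighted quantization: instead of taking the
# plain max-abs scale for a block, search a set of candidate scales for the
# one that minimizes importance-weighted reconstruction error. The
# importances stand in for activation statistics from calibration text.

rng = np.random.default_rng(2)
w = rng.normal(size=32)                    # one block of weights
imp = rng.uniform(0.1, 10.0, size=32)      # mock importance-matrix entries

def weighted_err(w, imp, scale):
    q = np.clip(np.round(w / scale), -127, 127)
    return float(np.sum(imp * (w - q * scale) ** 2))

base_scale = np.abs(w).max() / 127.0       # plain max-abs scale
candidates = np.concatenate(([base_scale], base_scale * np.linspace(0.8, 1.2, 81)))
best_scale = min(candidates, key=lambda s: weighted_err(w, imp, s))

print(weighted_err(w, imp, best_scale) <= weighted_err(w, imp, base_scale))
```

Since the baseline scale is among the candidates, the weighted search can only match or beat it; on important weights that is exactly where the gain lands.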
## VeriReason-Qwen2.5-1.5B-grpo-small GGUF
mradermacher · Large Language Model · English · 48 downloads · 1 like

A static quantization of the Nellyw888/VeriReason-Qwen2.5-1.5B-grpo-small model, focused on Verilog code generation and reasoning tasks.
## a-m-team AM-Thinking-v1 GGUF
bartowski · Apache-2.0 · Large Language Model · 671 downloads · 1 like

A llama.cpp imatrix quantization of the a-m-team/AM-Thinking-v1 model, available in multiple quantization types for text generation tasks.
## Qwen3-0.6B Llamafile
Mozilla · Apache-2.0 · Large Language Model · 250 downloads · 1 like

Qwen3 is the latest generation of the Qwen large language model series; this 0.6B-parameter dense model delivers notable gains in reasoning, instruction following, agent capabilities, and multilingual support.
## TheDrummer Rivermind-Lux-12B-v1 GGUF
bartowski · Large Language Model · 1,353 downloads · 1 like

A 12B-parameter large language model quantized with llama.cpp's imatrix method, offered in multiple quantized versions to accommodate different hardware.
## Gryphe Pantheon-Proto-RP-1.8-30B-A3B GGUF
bartowski · Apache-2.0 · Large Language Model · English · 2,972 downloads · 6 likes

A llama.cpp quantization of the Gryphe/Pantheon-Proto-RP-1.8-30B-A3B model, suited to role-playing and text generation tasks.
## Qwen3-30B-A3B 4bit DWQ (05082025)
mlx-community · Apache-2.0 · Large Language Model · 240 downloads · 5 likes

A 4-bit quantized conversion of Qwen/Qwen3-30B-A3B to MLX format, suited to text generation tasks.
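4-bit MLX models like this one rely on group-wise affine quantization: each small group of weights shares a scale and an offset, and every weight is stored as a 4-bit index. A sketch of the storage scheme (the group size of 64 and the omission of the DWQ distillation/calibration step are simplifications for illustration):

```python
import numpy as np

# Sketch of group-wise 4-bit affine quantization: each group of weights
# shares a scale and offset; each weight becomes a 4-bit index in 0..15.

GROUP = 64

def quantize_4bit(x):
    g = x.reshape(-1, GROUP)
    lo = g.min(axis=1, keepdims=True)
    hi = g.max(axis=1, keepdims=True)
    scale = np.where(hi > lo, (hi - lo) / 15.0, 1.0)  # 4 bits -> 16 levels
    q = np.round((g - lo) / scale).astype(np.uint8)   # indices in 0..15
    return q, scale, lo

def dequantize_4bit(q, scale, lo):
    return (q * scale + lo).reshape(-1)

rng = np.random.default_rng(3)
w = rng.normal(size=512).astype(np.float32)
q, s, lo = quantize_4bit(w)
err = float(np.abs(w - dequantize_4bit(q, s, lo)).max())
print("levels used:", int(q.max()) + 1, "| max abs error:", err)
```

Two 4-bit indices pack into one byte, so the weights shrink roughly 4x versus fp16 at the cost of a bounded per-group rounding error.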
## Bielik-1.5B-v3.0-Instruct-GGUF
speakleash · Apache-2.0 · Large Language Model · Other · 341 downloads · 3 likes

A 1.5B-parameter Polish instruction-tuned model from SpeakLeash's Bielik series, suited to text generation tasks.
## Microsoft Phi-4-reasoning-plus GGUF
bartowski · MIT · Large Language Model · Multilingual · 1,516 downloads · 10 likes

A quantized version of Microsoft's Phi-4-reasoning-plus, suited to efficient text generation on devices with limited resources.
## Muyan-TTS Q8_0 GGUF
NikolayKozloff · Speech Synthesis · 80 downloads · 2 likes

Muyan-TTS is a text-to-speech (TTS) model, converted here to GGUF format for use with llama.cpp.
## mlabonne Qwen3-14B-abliterated GGUF
bartowski · Large Language Model · 18.67k downloads · 16 likes

A quantized version of the Qwen3-14B-abliterated model, produced with llama.cpp's imatrix option, suited to text generation tasks.
## Qwen Qwen3-0.6B GGUF
tensorblock · Apache-2.0 · Large Language Model · 905 downloads · 3 likes

GGUF-format model files for Qwen/Qwen3-0.6B, quantized on TensorBlock's machines and compatible with llama.cpp.
## Llamaestra-3.2-1B Translation GGUF
tensorblock · Machine Translation · Multilingual · 5,028 downloads · 1 like

A 1B-parameter language model specialized in English-Italian translation, offered in multiple GGUF quantizations.
## Qwen2.5-7B-Instruct GGUF Llamafile
Bojun-Feng · Apache-2.0 · Large Language Model · English · 441 downloads · 2 likes

Qwen2.5 is the latest Tongyi Qianwen series, spanning base and instruction-tuned models from 0.5B to 72B parameters, with significant improvements in code, mathematics, instruction following, and long-text generation.
## Qwen2-96M
Felladrin · Apache-2.0 · Large Language Model · English · 76 downloads · 2 likes

Qwen2-96M is a miniature language model built on the Qwen2 architecture, with 96 million parameters and an 8192-token context length, suited to English text generation tasks.
## SmolVLM2-2.2B-Instruct i1 GGUF
mradermacher · Apache-2.0 · English · 285 downloads · 0 likes

SmolVLM2-2.2B-Instruct is a 2.2B-parameter vision-language model focused on video-text-to-text tasks, supporting English.
## ritrieve_zh_v1 GGUF
mradermacher · MIT · Large Language Model · Transformers · Chinese · 212 downloads · 1 like

A static quantization of the richinfoai/ritrieve_zh_v1 model that cuts storage and compute requirements while retaining most of the model's performance.